To achieve precise semantic correlation between image and text, an image-text retrieval method based on Feature Enhancement and Semantic Correlation Matching (FESCM) was proposed. Firstly, in the feature enhancement representation module, the multi-head self-attention mechanism was introduced to enhance image region features and text word features, reducing the interference of redundant information with the alignment of image regions and text words. Secondly, the semantic correlation matching module was used not only to capture the correspondence between locally salient objects by local matching, but also to incorporate image background information into the global image features and achieve accurate global semantic correlation by global matching. Finally, the local matching scores and global matching scores were combined to obtain the final image-text matching scores. Experimental results show that the FESCM-based image-text retrieval method improves the recall sum over the extended visual semantic embedding method by 5.7 and 7.5 percentage points on the Flickr8k and Flickr30k benchmark datasets, respectively, and improves the recall sum by 3.7 percentage points over the Two-Stream Hierarchical Similarity Reasoning method on the MS-COCO dataset. The proposed method can effectively improve the accuracy of image-text retrieval and establish the semantic connection between image and text.
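The abstract does not give the implementation details of the feature enhancement module; as a rough illustration only, the following minimal sketch shows multi-head self-attention over a set of region or word feature vectors, with identity Q/K/V projections and a residual connection assumed for brevity (both are hypothetical simplifications, not the paper's design):

```python
import math

def softmax(xs):
    # numerically stable softmax over a list of scores
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def self_attention(x):
    # x: list of d-dim feature vectors; identity Q/K/V projections for brevity
    d = len(x[0])
    out = []
    for q in x:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in x]
        w = softmax(scores)
        out.append([sum(wi * v[t] for wi, v in zip(w, x)) for t in range(d)])
    return out

def multi_head_enhance(x, num_heads):
    # split the feature dimension into heads, attend within each head,
    # concatenate the head outputs, and add a residual connection so the
    # original region/word features are "enhanced" rather than replaced
    d = len(x[0])
    assert d % num_heads == 0, "feature dim must divide evenly into heads"
    hd = d // num_heads
    heads = [self_attention([xi[h * hd:(h + 1) * hd] for xi in x])
             for h in range(num_heads)]
    enhanced = []
    for i in range(len(x)):
        concat = [v for h in range(num_heads) for v in heads[h][i]]
        enhanced.append([a + b for a, b in zip(x[i], concat)])
    return enhanced
```

Each attention output is a convex combination of the input features, so redundant regions contribute less to a feature the lower their attention weight.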
To address the problems that the upsampling process of U-Net tends to lose details and that pathological image datasets of stomach cancer are generally small, which easily leads to over-fitting, an automatic segmentation model for stomach cancer pathological images based on improved U-Net, namely EOU-Net, was proposed. In EOU-Net, EfficientNetV2 was used as the backbone of the existing U-Net model, thereby enhancing the feature extraction ability of the network encoder. In the decoding stage, the relations between cell pixels were explored on the basis of Object-Contextual Representation (OCR), and an improved OCR module was used to alleviate the loss of detail in the upsampled images. Then, Test Time Augmentation (TTA) post-processing was applied: the images obtained by flipping and rotating the input image at different angles were predicted separately, and the prediction results were combined by feature fusion to further optimize the network output, thereby effectively mitigating the problem of small medical datasets. Experimental results on the SEED, BOT and PASCAL VOC 2012 datasets show that the Mean Intersection over Union (MIoU) of EOU-Net is improved by 1.8, 0.6 and 4.5 percentage points respectively compared with that of OCRNet. Therefore, EOU-Net can obtain more accurate segmentation results for stomach cancer images.
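The TTA step described above follows a generic transform-predict-invert-average pattern; the following minimal sketch (the exact set of flips/rotations and the fusion rule used in the paper are not specified, so this set and a plain average are assumptions) illustrates it on a 2D prediction map:

```python
def rot90(img, k=1):
    # rotate a square 2D list clockwise k times
    for _ in range(k % 4):
        img = [list(row) for row in zip(*img[::-1])]
    return img

def hflip(img):
    # horizontal flip (self-inverse)
    return [row[::-1] for row in img]

def tta_predict(img, predict):
    # each entry pairs a forward transform with its inverse; the model's
    # prediction on each transformed copy is mapped back and averaged
    variants = [
        (lambda x: x, lambda y: y),
        (hflip, hflip),
        (lambda x: rot90(x, 1), lambda y: rot90(y, 3)),
        (lambda x: rot90(x, 2), lambda y: rot90(y, 2)),
        (lambda x: rot90(x, 3), lambda y: rot90(y, 1)),
    ]
    h, w = len(img), len(img[0])
    acc = [[0.0] * w for _ in range(h)]
    for fwd, inv in variants:
        pred = inv(predict(fwd(img)))
        for i in range(h):
            for j in range(w):
                acc[i][j] += pred[i][j] / len(variants)
    return acc
```

`predict` stands in for the segmentation network; averaging the inverse-transformed predictions smooths out orientation-dependent errors without needing more training data.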
6 Degree of Freedom (6DoF) pose estimation, which estimates the 6DoF pose of an object from a given input image, that is, 3DoF translation and 3DoF rotation, is a key technology in computer vision and robotics, and has become a crucial task in fields such as robot manipulation, autonomous driving and augmented reality. Firstly, the concept of the 6DoF pose was introduced, along with the problems of traditional methods based on feature point correspondence, template matching, and three-dimensional feature descriptors. Then, the current mainstream deep-learning-based 6DoF pose estimation algorithms were introduced in detail from several perspectives: methods based on feature correspondence, pixel voting and regression, and methods oriented to multi-object instances, synthetic data and category level. At the same time, the datasets and evaluation metrics commonly used in pose estimation were summarized, and some algorithms were evaluated experimentally to show their performance. Finally, the challenges and key future research directions of pose estimation were given.
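To make the "3DoF translation plus 3DoF rotation" decomposition concrete, a minimal illustrative sketch (not tied to any surveyed algorithm) uses Rodrigues' formula to turn a 3DoF axis-angle rotation into a matrix, then combines it with a 3DoF translation to map an object-frame point into the camera frame:

```python
import math

def axis_angle_to_matrix(axis, theta):
    # Rodrigues' formula: R = I + sin(t) K + (1 - cos(t)) K^2,
    # where K is the skew-symmetric matrix of the unit axis
    x, y, z = axis
    n = math.sqrt(x * x + y * y + z * z)
    x, y, z = x / n, y / n, z / n
    c, s = math.cos(theta), math.sin(theta)
    C = 1.0 - c
    return [
        [c + x * x * C,     x * y * C - z * s, x * z * C + y * s],
        [y * x * C + z * s, c + y * y * C,     y * z * C - x * s],
        [z * x * C - y * s, z * y * C + x * s, c + z * z * C],
    ]

def apply_pose(R, t, p):
    # p_cam = R @ p_obj + t: the full 6DoF pose is (R, t)
    return [sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3)]
```

For example, a 90-degree rotation about the z-axis sends the object-frame point (1, 0, 0) to (0, 1, 0) before the translation is added.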
To address the problems of low accuracy, difficult deployment and high calibration cost of visual manipulators in complex system environments, a robust joint modelling and optimization method for visual manipulators was proposed. Firstly, the subsystem models of the visual manipulator were integrated, and sample data such as servo motor rotation angles and manipulator end-effector coordinates were collected randomly in the workspace of the manipulator. Then, an Adaptive Multiple-Elites-guided Composite Differential Evolution algorithm with shift mechanism and Layered Optimization mechanism (AMECoDEs-LO) was proposed, and simultaneous optimization of the joint system parameters was completed by parameter identification. Principal Component Analysis (PCA) was performed by AMECoDEs-LO on stage data in the population, and the idea of parameter dimensionality reduction was used to implicitly guide convergence accuracy and speed. Experimental results show that with AMECoDEs-LO and the joint system model working together, the visual manipulator requires no additional instruments during calibration, achieving fast deployment and a 60% improvement in average accuracy compared to the conventional method. In the cases of broken manipulator linkages, reduced servo motor accuracy and increased camera positioning noise, the system still maintains high accuracy, which verifies the robustness of the proposed method.
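AMECoDEs-LO itself is not specified in the abstract; as a hedged illustration of the underlying idea of parameter identification by differential evolution, the sketch below is a classic DE/rand/1/bin loop minimizing a fitness function over bounded parameters (population size, F, CR and generation count are illustrative defaults; the elite-guidance, shift, PCA and layered-optimization mechanisms are omitted):

```python
import random

def differential_evolution(fitness, bounds, pop_size=20, F=0.5, CR=0.9,
                           gens=150, seed=0):
    # fitness: parameter vector -> error to minimize (e.g. end-effector
    # position residual in a joint system model); bounds: (lo, hi) per dim
    rng = random.Random(seed)
    dim = len(bounds)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    cost = [fitness(ind) for ind in pop]
    for _ in range(gens):
        for i in range(pop_size):
            # mutate with three distinct random individuals (DE/rand/1)
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            jr = rng.randrange(dim)  # forces at least one mutated gene
            trial = []
            for j in range(dim):
                if rng.random() < CR or j == jr:
                    v = pop[a][j] + F * (pop[b][j] - pop[c][j])
                    lo, hi = bounds[j]
                    trial.append(min(max(v, lo), hi))  # clamp to bounds
                else:
                    trial.append(pop[i][j])
            tc = fitness(trial)
            if tc <= cost[i]:  # greedy selection
                pop[i], cost[i] = trial, tc
    best = min(range(pop_size), key=lambda k: cost[k])
    return pop[best], cost[best]
```

In a calibration setting, `fitness` would compare the joint model's predicted end-effector coordinates against the randomly collected samples, so no external measurement instruments are needed beyond the camera itself.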